Handling verb phrase morphology in highly inflected Indian languages for Machine Translation

Authors

  • Ankur Gandhe
  • Rashmi Gangadharaiah
  • Karthik Visweswariah
  • Ananthakrishnan Ramanathan
Abstract

Phrase-based machine translation systems are limited to the phrases they see during training. For highly inflected languages, the parallel corpora used for training rarely contain every form of a word. The problem is amplified for verbs, whose correct form depends on factors such as gender, number, tense, and aspect. We propose augmenting the phrase table with all possible forms of a verb to improve the overall accuracy of the MT system. Our system uses simple stemmers and easily available monolingual data to generate new phrase table entries that cover the different variations seen for a verb. We report significant gains in BLEU for English-to-Hindi translation.
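The idea of filling a phrase table with verb forms found through stemming can be illustrated with a minimal sketch. This is not the authors' implementation: the suffix list, the romanized Hindi words, and the function names `stem`, `group_by_stem`, and `augment` are all assumptions made for illustration, and a real system would also need to assign translation scores to the new entries.

```python
from collections import defaultdict

# Hypothetical suffix list: romanized Hindi habitual verb endings (assumption).
SUFFIXES = ("ta", "ti", "te")


def stem(word):
    """Toy stemmer: strip one known suffix if the word ends with it."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def group_by_stem(monolingual_words):
    """Map each stem to the set of surface forms seen in monolingual data."""
    groups = defaultdict(set)
    for word in monolingual_words:
        groups[stem(word)].add(word)
    return groups


def augment(phrase_table, monolingual_words):
    """Return a copy of the phrase table extended with every monolingual
    form that shares a stem with an existing target entry."""
    groups = group_by_stem(monolingual_words)
    augmented = {src: set(tgts) for src, tgts in phrase_table.items()}
    for src, tgts in phrase_table.items():
        for tgt in tgts:
            augmented[src] |= groups.get(stem(tgt), set())
    return augmented


# Toy data: the parallel corpus only contained the masculine singular form.
phrase_table = {"eats": {"khata"}}
monolingual = ["khata", "khati", "khate", "pani"]

augmented = augment(phrase_table, monolingual)
print(sorted(augmented["eats"]))  # ['khata', 'khate', 'khati']
```

In this toy example the feminine and plural forms enter the table only because they occur in the monolingual word list, which mirrors the abstract's point that monolingual data is easier to obtain than parallel data.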

Similar resources

Bidirectional machine translation in indian languages

This paper discusses the approach adopted in the development of a bidirectional Machine Translation system for Indian languages. The approach makes use of the characteristics of the languages to simplify the process of translation. The verb-final sentence structure and the case-inflected nature of Indian language sentences have led us to adopt a verb-centered approach. The analysis is carried...

Rich morpho-syntactic descriptors for factored machine translation with highly inflected languages as target

The baseline phrase-based translation approach has limited success in translating between languages with very different syntax and morphology, especially when the translation direction is from a language with fixed word structure to a highly inflected language. There are two main points to improve on: morphological translation equivalence and long-range reordering. Translating the correct surfa...

English-Latvian SMT: knowledge or data?

When phrase-based statistical machine translation (SMT) is applied to languages with rather free word order and rich morphology, translated texts are often not fluent because of misused inflectional forms and wrong word order between phrases or even inside a phrase. One possible way to improve translation quality is to apply factored models. The paper presents work on Englis...

Improving statistical machine translation by classifying and generalizing inflected verb forms

This paper introduces a rule-based classification of single-word and compound verbs into a statistical machine translation approach. By substituting verb forms with the lemma of their head verb, the data sparseness problem caused by highly inflected languages can be successfully addressed. On the other hand, the information of seen verb forms can be used to generate new translations for unseen ve...

Evaluating Machine Translation Evaluation’s BLEU Metric for English to Hindi Language Machine Translation

Machine Translation Evaluation (MTE) has been widely recognized by the Machine Translation (MT) community. The main objective of MT is to break the language barrier in a multilingual nation like India. Evaluation of MT is required for Indian languages because the same MT approach does not work for Indian languages as it does for European languages, owing to differences in language structure. So, there is a great need to develop ...

Publication date: 2011